    PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions

    As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species _Drosophila_ genome alignments exceeds that of all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.
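The "formal statistical comparison" of two models mentioned in this abstract can be illustrated with a log-likelihood ratio. The sketch below is not the PhyloCSF implementation: it assumes that per-site likelihoods of the alignment under a coding model and a non-coding model have already been computed elsewhere, and the function name and the decibans scale (10 times the base-10 log ratio) are choices made here for illustration.

```python
import math


def log_likelihood_ratio_decibans(lik_coding, lik_noncoding):
    """Compare two models by their total log-likelihood over aligned sites.

    Both arguments are hypothetical inputs: sequences of per-site likelihoods
    (each in (0, 1]) under the coding and the non-coding model.
    A positive score favours the coding model.
    """
    if len(lik_coding) != len(lik_noncoding):
        raise ValueError("both models must score the same number of sites")
    log_coding = sum(math.log10(p) for p in lik_coding)
    log_noncoding = sum(math.log10(p) for p in lik_noncoding)
    # 1 ban = a likelihood ratio of 10; 1 deciban = a tenth of a ban.
    return 10.0 * (log_coding - log_noncoding)


if __name__ == "__main__":
    score = log_likelihood_ratio_decibans([0.1, 0.01], [0.01, 0.01])
    print(f"score: {score:.1f} decibans")
```

Working in log space avoids numerical underflow when many small per-site likelihoods are multiplied together.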

    Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3-prime untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in human, chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3-prime polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3-prime UTR re-annotation (including extension of one 3-prime UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data. Comment: 44 pages, 9 figures

    GIVE: portable genome browsers for personal websites.

    The growing popularity and diversity of genomic data demand portable and versatile genome browsers. Here, we present an open source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add to a personal webpage interactive visualization of multiple types of genomics data, including genome annotation, "linear" quantitative data, and genome interaction data. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user-chosen data, which can be copy-pasted into a user's personal website or saved and shared with collaborators. GIVE is available at: https://www.givengine.org/

    Tiling array data analysis: a multiscale approach using wavelets

    Background: Tiling array data is hard to interpret due to noise. The wavelet transformation is a widely used technique in signal processing for elucidating the true signal from noisy data. Consequently, we attempted to denoise representative tiling array datasets for ChIP-chip experiments using wavelets. In doing this, we used specific wavelet basis functions, Coiflets, since their triangular shape closely resembles the expected profiles of true ChIP-chip peaks. Results: In our wavelet-transformed data, we observed that noise tends to be confined to small scales while the useful signal-of-interest spans multiple large scales. We were also able to show that wavelet coefficients due to non-specific cross-hybridization follow a log-normal distribution, and we used this fact in developing a thresholding procedure. In particular, wavelets allow one to set an unambiguous, absolute threshold, which has been hard to define in ChIP-chip experiments. One can set this threshold by requiring a similar confidence level at different length-scales of the transformed signal. We applied our algorithm to a number of representative ChIP-chip data sets, including those of Pol II and histone modifications, which have a diverse distribution of length-scales of biochemical activity, including some broad peaks. Conclusions: Finally, we benchmarked our method in comparison to other approaches for scoring ChIP-chip data using spike-ins on the ENCODE Nimblegen tiling array. This comparison demonstrated excellent performance, with wavelets getting the best overall score.
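The general idea of wavelet denoising described in this abstract (transform, suppress small coefficients, invert) can be sketched in a few lines. This is not the authors' procedure: for self-containment it swaps in the Haar wavelet instead of Coiflets, and it applies one fixed hard threshold at every scale rather than a threshold derived from a log-normal noise model. All function names and parameters are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one level of the Haar transform."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def denoise(signal, levels, threshold):
    """Zero every detail coefficient whose magnitude is <= threshold.

    Assumption: len(signal) is divisible by 2**levels. A single threshold is
    used at all scales, which is a simplification of scale-aware thresholding.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append([d if abs(d) > threshold else 0.0 for d in detail])
    # Rebuild from the coarsest scale back to the original resolution.
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


if __name__ == "__main__":
    noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 5.1, 4.9, 5.0]
    print([round(x, 3) for x in denoise(noisy, levels=2, threshold=0.5)])
```

In this toy input the small fluctuations produce only small detail coefficients, so they are removed, while the step between the two plateaus lives in the coarse approximation and survives.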

    Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

    Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms are able to predict miRNA genes and their targets, but their transcription regulation is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoters. However, the length of the primary transcripts and the promoter organization are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al.

    Evaluation of two commercial global miRNA expression profiling platforms for detection of less abundant miRNAs

    Background: microRNAs (miRNA) are short, endogenous transcripts that negatively regulate the expression of specific mRNA targets. miRNAs are found both in tissues and body fluids such as plasma. A major perspective for the use of miRNAs in the clinical setting is as diagnostic plasma markers for neoplasia. While miRNAs are abundant in tissues, they are often scarce in plasma. For quantification of miRNA in plasma it is therefore of importance to use a platform with high sensitivity and linear performance in the low concentration range. This motivated us to evaluate the performance of three commonly used commercial miRNA quantification platforms: GeneChip miRNA 2.0 Array, miRCURY Ready-to-Use PCR, Human panel I+II V1.M, and TaqMan Human MicroRNA Array v3.0. Results: Using synthetic miRNA samples and plasma RNA samples spiked with different ratios of 174 synthetic miRNAs we assessed the performance characteristics reproducibility, recovery, specificity, sensitivity and linearity. It was found that while the qRT-PCR based platforms were sufficiently sensitive to reproducibly detect miRNAs at the abundance levels found in human plasma, the array based platform was not. At high miRNA levels both qRT-PCR based platforms performed well in terms of specificity, reproducibility and recovery. At low miRNA levels, as in plasma, the miRCURY platform showed better sensitivity and linearity than the TaqMan platform. Conclusion: For profiling clinical samples with low miRNA abundance, such as plasma samples, the miRCURY platform with its better sensitivity and linearity would probably be superior.

    miR-22 Forms a Regulatory Loop in PTEN/AKT Pathway and Modulates Signaling Kinetics

    Background: The tumor suppressor PTEN (phosphatase and tensin homolog) is a lipid phosphatase that converts PIP3 into PIP2 and downregulates the kinase AKT and its proliferative and anti-apoptotic activities. The FoxO transcription factors are PTEN downstream effectors whose activity is negatively regulated by AKT-mediated phosphorylation. PTEN activity is frequently lost in many types of cancer, leading to increased cell survival and cell cycle progression. Principal Findings: Here we characterize the widely expressed miR-22 and report that miR-22 is a novel regulatory molecule in the PTEN/AKT pathway. miR-22 downregulates PTEN levels acting directly through a specific site on the PTEN 3'UTR. Interestingly, miR-22 itself is upregulated by AKT, suggesting that miR-22 forms a feed-forward circuit in this pathway. Time-resolved live imaging of AKT-dependent FoxO1 phosphorylation revealed that miR-22 accelerated AKT activity upon growth factor stimulation, and attenuated its downregulation by serum withdrawal. Conclusions: Our results suggest that miR-22 acts to fine-tune the dynamics of the PTEN/AKT/FoxO1 pathway.

    Comprehensive comparative analysis of strand-specific RNA sequencing methods

    Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms. Funding: Howard Hughes Medical Institute; United States-Israel Binational Science Foundation
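Of the library quality metrics named in this abstract, strand specificity is the simplest to illustrate. The sketch below is one plausible definition, not the pipeline from the paper: it assumes each read has already been assigned to an annotated transcript and that its strand has been converted to the strand of the RNA molecule of origin (protocols differ in which mate reports which strand). The function name and input layout are hypothetical.

```python
def strand_specificity(pairs):
    """Fraction of reads whose strand agrees with the annotated transcript.

    `pairs` is a hypothetical input: an iterable of
    (read_strand, annotated_strand) tuples, each strand being '+' or '-'.
    A value near 1.0 suggests a highly strand-specific library; a value
    near 0.5 is what an unstranded library would be expected to give.
    """
    total = 0
    sense = 0
    for read_strand, annotated_strand in pairs:
        total += 1
        if read_strand == annotated_strand:
            sense += 1
    if total == 0:
        raise ValueError("no reads overlap annotated transcripts")
    return sense / total


if __name__ == "__main__":
    reads = [("+", "+"), ("-", "-"), ("+", "-"), ("+", "+")]
    print(f"strand specificity: {strand_specificity(reads):.2f}")
```

A well-annotated transcriptome matters for this kind of metric because genuine antisense transcription would otherwise be counted as protocol error.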